Fail to reject H0 at the 0.95 confidence level (p-value 0.3577, critical value 43.773, chi2_stat 32.2124):
There is no evidence of a relationship between the CLASSES and DAY variables
Reject H0 at the 0.95 confidence level (p-value 0.0, critical value 7.8147, chi2_stat 30.9777):
There is a significant relationship between the CLASSES and MONTH variables
Fail to reject H0 at the 0.95 confidence level (p-value 1.0, critical value nan, chi2_stat 0.0):
The test is degenerate for YEAR (zero degrees of freedom, hence the nan critical value), so it gives no information about a relationship between the CLASSES and YEAR variables
Reject H0 at the 0.95 confidence level (p-value 0.0, critical value 28.8693, chi2_stat 73.5415):
There is a significant relationship between the CLASSES and TEMPERATURE variables
Reject H0 at the 0.95 confidence level (p-value 0.0348, critical value 80.2321, chi2_stat 82.5022):
There is a significant relationship between the CLASSES and RH variables
Fail to reject H0 at the 0.95 confidence level (p-value 0.3124, critical value 27.5871, chi2_stat 19.2858):
There is no evidence of a relationship between the CLASSES and WS variables
Reject H0 at the 0.95 confidence level (p-value 0.0, critical value 53.3835, chi2_stat 125.9923):
There is a significant relationship between the CLASSES and RAIN variables
Reject H0 at the 0.95 confidence level (p-value 0.0003, critical value 203.6015, chi2_stat 243.0):
There is a significant relationship between the CLASSES and FFMC variables
Reject H0 at the 0.95 confidence level (p-value 0.0166, critical value 194.8825, chi2_stat 204.9135):
There is a significant relationship between the CLASSES and DMC variables
Reject H0 at the 0.95 confidence level (p-value 0.0211, critical value 229.6632, chi2_stat 238.2561):
There is a significant relationship between the CLASSES and DC variables
Reject H0 at the 0.95 confidence level (p-value 0.0, critical value 129.918, chi2_stat 231.3436):
There is a significant relationship between the CLASSES and ISI variables
Reject H0 at the 0.95 confidence level (p-value 0.0234, critical value 203.6015, chi2_stat 210.8094):
There is a significant relationship between the CLASSES and BUI variables
Reject H0 at the 0.95 confidence level (p-value 0.0, critical value 150.9894, chi2_stat 220.636):
There is a significant relationship between the CLASSES and FWI variables
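Each pair of lines above reports a chi-square test of independence between CLASSES and one feature. The sketch below is a dependency-free reconstruction under stated assumptions, not the code that produced these numbers: every distinct feature value is treated as its own category (presumably how continuous features such as TEMPERATURE and RH were tested, given critical values as large as 229.6632), the p-value is the chi-square upper tail, and the critical value is the 0.95 quantile found by bisection. With zero degrees of freedom, as for a feature that takes a single value, the critical value is nan and H0 cannot be rejected, which matches the YEAR line. In practice `scipy.stats.chi2_contingency` and `scipy.stats.chi2.ppf` do this work; note that `chi2_contingency` applies Yates' continuity correction by default when there is one degree of freedom, which this sketch does not. The helper names and toy data are hypothetical.

```python
import math
from collections import Counter


def _gamma_p(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    # Terms grow while n < x - a, then shrink; stop once they no longer affect the sum.
    while term > total * 1e-16 and n < 100_000:
        term *= x / (a + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))


def chi2_p_value(stat: float, dof: int) -> float:
    """Upper-tail probability P(X >= stat) for a chi-square variable with `dof` dof."""
    if dof <= 0:
        return 1.0  # degenerate test: nothing can be rejected
    return 1.0 - _gamma_p(dof / 2.0, stat / 2.0)


def chi2_critical(dof: int, confidence: float = 0.95) -> float:
    """Critical value c with P(X <= c) = confidence; nan when dof == 0."""
    if dof <= 0:
        return math.nan
    lo, hi = 0.0, 1.0
    while _gamma_p(dof / 2.0, hi / 2.0) < confidence:
        hi *= 2.0
    for _ in range(200):  # plain bisection on the chi-square CDF
        mid = (lo + hi) / 2.0
        if _gamma_p(dof / 2.0, mid / 2.0) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi2_independence(classes, feature, confidence=0.95):
    """Chi-square test of independence; every distinct feature value is a category."""
    n = len(classes)
    observed = Counter(zip(classes, feature))
    row_totals = Counter(classes)
    col_totals = Counter(feature)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    critical = chi2_critical(dof, confidence)
    reject = dof > 0 and stat > critical
    return stat, dof, chi2_p_value(stat, dof), critical, reject


if __name__ == "__main__":
    # Toy data (hypothetical), not the forest-fire dataset.
    classes = ["fire"] * 10 + ["not fire"] * 10
    month = ["Aug"] * 10 + ["Jun"] * 10
    stat, dof, p, crit, reject = chi2_independence(classes, month)
    verdict = "Reject H0" if reject else "Fail to reject H0"
    print(f"{verdict} at the 0.95 confidence level (p-value {round(p, 4)}, "
          f"critical value {round(crit, 4)}, chi2_stat {round(stat, 4)})")
```

When a feature has many distinct values and few observations per value, many expected cell counts are small and the chi-square approximation becomes rough, so the results for the continuous features are best read as indicative.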
Conclusions: Classification Problem
FWI is calculated from FFMC, DMC, ISI and BUI, and these features are very strongly correlated with one another (correlation coefficients of 0.9 or more), so I decided on the following strategy:
Selected features: Temperature, Rain, RH, DC, FWI and month
I judged that standardization is needed, along with a transformation such as log(x + 0.5) for FWI and DC
I trained the model in three configurations: with standardization and transformation, with standardization only, and with neither standardization nor transformation
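The three preprocessing configurations can be sketched per feature column as follows. This is a minimal, dependency-free illustration, assuming the log(x + 0.5) transform is applied before standardization and that the mean and standard deviation are learned on the training split only; the configuration names and FWI values are hypothetical. Following the text, the log step would be applied only to the FWI and DC columns. In scikit-learn the same idea is usually a `Pipeline` combining `FunctionTransformer` and `StandardScaler`.

```python
import math
from statistics import mean, pstdev


def log_shift(values, shift=0.5):
    """log(x + 0.5) transform from the text; defined only for x > -0.5."""
    return [math.log(v + shift) for v in values]


def fit_standardizer(train_values):
    """Mean and population std (ddof=0, as in scikit-learn's StandardScaler), from training data only."""
    return mean(train_values), pstdev(train_values)


def apply_standardizer(values, params):
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


def preprocess(train, test, config):
    """Return (train, test) for one of three hypothetical configuration names."""
    if config == "standardize+log":
        train, test = log_shift(train), log_shift(test)
    if config in ("standardize+log", "standardize"):
        params = fit_standardizer(train)  # never fit on the test split
        return apply_standardizer(train, params), apply_standardizer(test, params)
    if config == "raw":
        return list(train), list(test)
    raise ValueError(f"unknown config: {config}")


if __name__ == "__main__":
    train_fwi = [0.0, 1.2, 4.5, 10.8, 20.1]  # hypothetical FWI values
    test_fwi = [0.4, 7.3]
    for config in ("standardize+log", "standardize", "raw"):
        print(config, preprocess(train_fwi, test_fwi, config))
```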
The remaining original features (FFMC, DMC, ISI, BUI) cannot be included alongside FWI because of their high correlation with it. Since FWI is calculated from those features, including them and excluding FWI would only complicate the model without improving performance.
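Strongly correlated feature pairs can be flagged with a small helper before deciding what to drop. This sketch computes Pearson's r for every column pair and keeps pairs with |r| of at least 0.9, a threshold chosen to match the level the text describes; the column names reuse the document's feature names but the values are made up. With pandas, `DataFrame.corr()` (Pearson by default) returns the full matrix directly.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical values, not taken from the dataset.
    columns = {"FWI": [1, 2, 3, 4], "BUI": [2, 4, 6, 8.5], "RH": [4, 1, 3, 2]}
    for a, b, r in highly_correlated_pairs(columns):
        print(f"{a} ~ {b}: r = {r:.3f}")
```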
I performed feature engineering so that the information in the other features could still be used. The generated features include FWI/FFMC, (DMC/FWI)/ISI and FWI/BUI.
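A minimal sketch of the ratio features listed above, computed for a single row. The source does not say how zero denominators were handled; as an assumption, the hypothetical helper `_ratio` returns nan when a denominator (FWI, FFMC, ISI or BUI) is zero, so that such rows can be imputed or dropped explicitly rather than raising an error.

```python
import math


def _ratio(num, den):
    """Division that yields nan instead of raising when the denominator is zero (assumption)."""
    return num / den if den != 0 else math.nan


def engineer_features(row):
    """Ratio features named in the text; `row` maps column names to numbers."""
    fwi, ffmc, dmc, isi, bui = (row[k] for k in ("FWI", "FFMC", "DMC", "ISI", "BUI"))
    return {
        "FWI/FFMC": _ratio(fwi, ffmc),
        "(DMC/FWI)/ISI": _ratio(_ratio(dmc, fwi), isi),
        "FWI/BUI": _ratio(fwi, bui),
    }


if __name__ == "__main__":
    # Hypothetical row, not taken from the dataset.
    print(engineer_features({"FWI": 2.0, "FFMC": 80.0, "DMC": 10.0, "ISI": 5.0, "BUI": 4.0}))
```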
I trained the model on these features using the same three configurations: standardization with transformation, standardization only, and neither.
All models were scored on F1 and ROC AUC, and refit on the ROC AUC score.
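Scoring on F1 and ROC AUC while refitting on ROC AUC matches scikit-learn's multi-metric search, for example `GridSearchCV(estimator, param_grid, scoring=["f1", "roc_auc"], refit="roc_auc")`. The dependency-free sketch below mirrors only the selection logic, assuming each candidate model is represented by its predicted probabilities for the positive class (encoded as 1) and that F1 uses a 0.5 threshold; the candidate names and probabilities are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_by_roc_auc(candidates, y_true, threshold=0.5):
    """Report F1 and ROC AUC for every candidate, then pick the best by ROC AUC."""
    report = {}
    for name, probs in candidates.items():
        preds = [1 if p >= threshold else 0 for p in probs]
        report[name] = {"f1": f1_score(y_true, preds), "roc_auc": roc_auc(y_true, probs)}
    best = max(report, key=lambda name: report[name]["roc_auc"])
    return best, report


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    candidates = {"model_a": [0.45, 0.7, 0.6, 0.9], "model_b": [0.1, 0.2, 0.3, 0.4]}
    print(select_by_roc_auc(candidates, y))
```

In this toy case `model_b` wins on ROC AUC despite an F1 of 0.0 at the 0.5 threshold, which shows that the refit metric, not F1, decides which candidate is kept.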
The best-performing models in terms of F1, ROC AUC, precision, balanced accuracy, etc. were those trained on the engineered features with standardization. The second-best model used the original features, including DC and FWI, without standardization or transformation.
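For reference, accuracy, precision, recall and balanced accuracy all follow from the confusion matrix. This sketch assumes binary labels with the positive class (fire) encoded as 1; the labels in the usage lines are hypothetical. For binary problems, scikit-learn's `balanced_accuracy_score` gives the same value: the mean of the recall on each class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def summary_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0       # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical labels
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(summary_metrics(y_true, y_pred))
```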
Best model: accuracy = 97.959%, F1 score = 97.128%, ROC AUC score = 97.22%